Clustering with Beta Divergences
نویسندگان
چکیده
Clustering algorithms start with a fixed divergence, which captures the possibly asymmetric distance between a sample and a centroid. In the mixture model setting, the sample distribution plays the same role. When all attributes have the same topology and dispersion, the data are said to be homogeneous. If the prior knowledge of the distribution is inaccurate or the set of plausible distributions is large, an adaptive approach is essential. The motivation is more compelling for heterogeneous data, where the dispersion or the topology differs among attributes. We propose an adaptive approach to clustering using classes of parametrized Bregman divergences. We first show that the density of a steep exponential dispersion model (EDM) can be represented with a Bregman divergence. We then propose AdaCluster, an expectation-maximization (EM) algorithm to cluster heterogeneous data using classes of steep EDMs. We compare AdaCluster with EM for a Gaussian mixture model on synthetic data and nine UCI data sets. We also propose an adaptive hard clustering algorithm based on Generalized Method of Moments. We compare the hard clustering algorithm with k-means on the UCI data sets. We empirically verified that adaptively learning the underlying topology yields better clustering of heterogeneous data.
منابع مشابه
On Clustering Histograms with k-Means by Using Mixed α-Divergences
Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retr...
متن کاملFamilies of Alpha- Beta- and Gamma- Divergences: Flexible and Robust Measures of Similarities
In this paper, we extend and overview wide families of Alpha-, Betaand Gammadivergences and discuss their fundamental properties. In literature usually only one single asymmetric (Alpha, Beta or Gamma) -divergence is considered. We show in this paper that there exist families of such divergences with the same consistent properties. Moreover, we establish links and correspondences among these di...
متن کاملTotal Jensen divergences: Definition, Properties and k-Means++ Clustering
We present a novel class of divergences induced by a smooth convex function called total Jensendivergences. Those total Jensen divergences are invariant by construction to rotations, a feature yieldingregularization of ordinary Jensen divergences by a conformal factor. We analyze the relationships be-tween this novel class of total Jensen divergences and the recently introduced tota...
متن کاملNon-flat Clusteringwhith Alpha-divergences
The scope of the well-known k-means algorithm has been broadly extended with some recent results: first, the kmeans++ initialization method gives some approximation guarantees; second, the Bregman k-means algorithm generalizes the classical algorithm to the large family of Bregman divergences. The Bregman seeding framework combines approximation guarantees with Bregman divergences. We present h...
متن کاملAlpha/Beta Divergences and Tweedie Models
We describe the underlying probabilistic interpretation of alpha and beta divergences. We first show that beta divergences are inherently tied to Tweedie distributions, a particular type of exponential family, known as exponential dispersion models. Starting from the variance function of a Tweedie model, we outline how to get alpha and beta divergences as special cases of Csiszár’s f and Bregma...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1510.05491 شماره
صفحات -
تاریخ انتشار 2015